Abstract

In this work, we consider a generalized fault model that can be used to represent a wide range
of failure scenarios, including correlated failures and non-uniform node reliabilities. This fault
model is general in the sense that fault models studied in prior related work, such as the $f$-total and
$f$-local models, are special cases of the generalized fault model. Under the generalized fault
model, we explore iterative approximate Byzantine consensus (IABC) algorithms in arbitrary
directed networks. We prove a necessary and sufficient condition for the existence of IABC
algorithms. The use of the generalized fault model helps to gain a better understanding of
IABC algorithms.

1 Introduction


Dolev et al. [4] introduced the notion of approximate Byzantine consensus by relaxing the requirement
of exact consensus [12]. The goal in approximate consensus is to allow the fault-free nodes to agree
on values that are approximately equal to each other (and not necessarily exactly identical). In
the presence of Byzantine faults, while exact consensus is impossible in asynchronous systems [5],
approximate consensus is achievable [4]. The notion of approximate consensus is of interest
in synchronous systems as well, since approximate consensus can be achieved using distributed
algorithms that do not require complete knowledge of the network topology [1]. The rest of the
discussion in this paper assumes a synchronous system.

The fault model assumed in much of the work on Byzantine consensus allows up to $f$ Byzantine
faulty nodes in the network. We will refer to this fault model as the "$f$-total" fault model [16, 10,
4, 12]. In prior work, other fault models have been explored as well. For instance, in the "$f$-local"
fault model, up to $f$ neighbors of each node in the network may be faulty [8, 2, 16], and in the
$f$-fraction model [16], up to a fraction $f$ of the neighbors of each node may be faulty. In this paper,
we consider a generalized fault model (to be described in the next section). The generalized fault
model specifies a "fault domain", which is a collection of feasible fault sets (a similar fault model
was recently presented in [9]). For example, in a system consisting of four nodes, namely, nodes 1, 2, 3
and 4, the fault domain could be specified as $\mathcal{F} = \{\{1\}, \{2,3,4\}\}$. Thus, in this case, either node 1
may be faulty, or any subset of nodes in $\{2,3,4\}$ may be faulty. However, node 1 may not be faulty
simultaneously with another node. The new fault model is general in the sense that the other fault
models studied in the literature, such as the $f$-total, $f$-local and $f$-fraction models, are special cases of
the generalized fault model.
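To illustrate how the $f$-total and $f$-local models fit into the generalized model, the following Python sketch enumerates the corresponding fault domains for a tiny network. The function names and the `in_neighbors` dictionary are illustrative assumptions, not notation from this paper, and the $f$-local predicate assumes that the bound applies to the incoming neighbors of every node.

```python
from itertools import combinations

def f_total_domain(nodes, f):
    """Fault domain for the f-total model: all node sets of size min(f, n).

    Smaller fault sets are feasible automatically, because a feasible
    fault set is any subset of a member of the fault domain.
    """
    ordered = sorted(nodes)
    size = min(f, len(ordered))
    return [frozenset(c) for c in combinations(ordered, size)]

def f_local_domain(in_neighbors, f):
    """Fault domain for the f-local model (assumed reading: every node
    has at most f faulty incoming neighbors).

    in_neighbors maps each node to the set of its incoming neighbors.
    The enumeration is exponential, so it is only meant for tiny examples.
    """
    ordered = sorted(in_neighbors)
    domain = []
    for size in range(len(ordered) + 1):
        for candidate in combinations(ordered, size):
            faulty = set(candidate)
            if all(len(faulty & in_neighbors[i]) <= f for i in ordered):
                domain.append(frozenset(faulty))
    return domain

# Complete graph on three nodes (hypothetical example).
complete3 = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
total = f_total_domain(complete3, 1)
local = f_local_domain(complete3, 1)
```

In this small example both models admit the same feasible fault sets (the empty set and the three singletons), although in general they differ.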

Analysis of consensus under the generalized fault model offers some new insights into how
the choice of the fault model affects algorithm design. In particular, we consider "iterative"
algorithms for achieving approximate Byzantine consensus in synchronous point-to-point networks
that are modeled by arbitrary directed graphs. The iterative approximate Byzantine consensus (IABC)
algorithms of interest have the following properties, which we will soon state more formally:

•  Initial  state  of  each  node  is  equal  to  a  real-valued  input  provided  to  that  node. 

•  Validity  condition:  After  each  iteration  of  an  IABC  algorithm,  the  state  of  each  fault-free  node 
must  remain  in  the  convex  hull  of  the  states  of  the  fault-free  nodes  at  the  end  of  the  previous 
iteration. 

• Convergence condition: For any $\epsilon > 0$, after a sufficiently large number of iterations, the states
of the fault-free nodes are guaranteed to be within $\epsilon$ of each other.

This paper is a generalization of our recent work on IABC algorithms under the $f$-total fault
model [14, 13]. The contributions of this paper are as follows:

•  We  identify  a  necessary  condition  on  the  communication  graph  for  the  existence  of  a  correct 
IABC  algorithm  under  the  generalized  fault  model  (Sections  3  and  4). 

•  We  introduce  a  new  IABC  algorithm  for  the  generalized  fault  model  (Section  5)  that  uses 
only  "local"  information. 

• A transition matrix representation of the new IABC algorithm is presented (Section 6). This
representation is then used to prove the correctness of the proposed algorithm (Section 6.3).

Since the results here generalize our prior results [14, 13], the proof techniques used here
naturally have some similarities to the prior work. The material in Section 6.3 bears the strongest
similarity to our prior work. The rest of the paper, however, presents results that provide new
intuition on the problem of approximate consensus. In particular, the material in Sections 4 and 5
sheds light on how the fault model influences the design of IABC algorithms.


2  Models 

Communication Model: The system is assumed to be synchronous. The communication network
is modeled as a simple directed graph $G(\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{1, \ldots, n\}$ is the set of $n$ nodes, and $\mathcal{E}$
is the set of directed edges between the nodes in $\mathcal{V}$. We assume that $n \geq 2$, since the consensus
problem for $n = 1$ is trivial. Node $i$ can reliably transmit messages to node $j$ if and only if the
directed edge $(i, j)$ is in $\mathcal{E}$. Each node can send messages to itself as well; however, for convenience,
we exclude self-loops from set $\mathcal{E}$. That is, $(i, i) \notin \mathcal{E}$ for $i \in \mathcal{V}$. With a slight abuse of terminology,
we will use the terms edge and link interchangeably in our presentation.

For each node $i$, let $N_i^-$ be the set of nodes from which $i$ has incoming edges. That is, $N_i^- =
\{ j \mid (j, i) \in \mathcal{E} \}$. Similarly, define $N_i^+$ as the set of nodes to which node $i$ has outgoing edges. That
is, $N_i^+ = \{ j \mid (i, j) \in \mathcal{E} \}$. Nodes in $N_i^-$ and $N_i^+$ are, respectively, said to be incoming and outgoing
neighbors of node $i$. Since we exclude self-loops from $\mathcal{E}$, $i \notin N_i^-$ and $i \notin N_i^+$. However, we note
again that each node can indeed send messages to itself.


Generalized Byzantine Failure Model: We consider the Byzantine failure model, with possible
faulty nodes specified using a "fault domain" $\mathcal{F}$ (defined below). A faulty node may misbehave
arbitrarily. Possible misbehavior includes transmitting incorrect and mismatching (or inconsistent)
messages to different neighbors. The faulty nodes may collaborate with each other. Moreover,
the faulty nodes are assumed to have complete knowledge of the execution of the algorithm,
including the states of all the nodes, the algorithm specification, and the network topology.

The generalized fault model is characterized using fault domain $\mathcal{F} \subseteq 2^{\mathcal{V}}$ as follows: Nodes in
set $F$ may fail during an execution of the algorithm only if there exists a set $F^* \in \mathcal{F}$ such that $F \subseteq F^*$.
Set $F$ is then said to be a feasible fault set.

Definition 1 Set $F \subseteq \mathcal{V}$ is said to be a feasible fault set, if there exists $F^* \in \mathcal{F}$ such that $F \subseteq F^*$.


Thus, each set in $\mathcal{F}$ specifies nodes that may all potentially fail during a single execution of the
algorithm (a similar fault model is also considered in [9]). This feature can be used to capture the
notion of correlated failures. For example, consider a system consisting of four nodes, namely,
nodes 1, 2, 3, and 4. Suppose that

$$\mathcal{F} = \{\{1\}, \{2\}, \{3,4\}\}$$

This definition of $\mathcal{F}$ implies that during an execution either (i) node 1 may fail, or (ii) node 2 may
fail, or (iii) any subset of $\{3,4\}$ may fail, and no other combination of nodes may fail (e.g., nodes 1
and 3 cannot both fail in a single execution). In this case, the reason that the set $\{3,4\}$ is in the fault
domain may be that the failures of nodes 3 and 4 are correlated.
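Definition 1 translates directly into a subset test. The sketch below encodes the four-node example as a list of Python `frozenset` objects; the names `FAULT_DOMAIN`, `is_feasible` and `all_feasible_sets` are illustrative choices, not part of the paper.

```python
from itertools import combinations

# Fault domain from the four-node example: {{1}, {2}, {3, 4}}.
FAULT_DOMAIN = [frozenset({1}), frozenset({2}), frozenset({3, 4})]

def is_feasible(fault_set, fault_domain):
    """Definition 1: F is feasible if F is a subset of some F* in the domain."""
    candidate = frozenset(fault_set)
    return any(candidate <= member for member in fault_domain)

def all_feasible_sets(fault_domain):
    """Enumerate every feasible fault set, i.e. all subsets of all members."""
    feasible = set()
    for member in fault_domain:
        items = sorted(member)
        for size in range(len(items) + 1):
            feasible.update(frozenset(c) for c in combinations(items, size))
    return feasible
```

For this domain, `is_feasible({3, 4}, FAULT_DOMAIN)` holds while `is_feasible({1, 3}, FAULT_DOMAIN)` does not, matching the observation that nodes 1 and 3 cannot both fail in a single execution.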

The generalized fault model is also useful to capture variations in node reliability. For instance,
in the above example, nodes 1 and 2 may be more reliable than nodes 3 and 4. Therefore, while
simultaneous failure of nodes 3 and 4 may occur, simultaneous failure of nodes 1 and 2 is less
likely. Therefore, $\{1,2\} \notin \mathcal{F}$.

Local knowledge of $\mathcal{F}$: To implement our IABC Algorithm presented in Section 5, it is sufficient for
each node $i$ to know $N_i^- \cap F$, for each feasible fault set $F$. In other words, each node only needs to
know the set of its incoming neighbors that may fail simultaneously. Thus, the iterative algorithm
can be implemented using only "local" information regarding $\mathcal{F}$.
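A minimal sketch of this local view follows. Since every feasible fault set is a subset of some member $F^*$ of the fault domain, it is enough for node $i$ to store the intersections $N_i^- \cap F^*$. The function name `local_fault_view` and the example neighborhood are assumptions made for illustration.

```python
def local_fault_view(i, in_neighbors, fault_domain):
    """Return the distinct sets N_i^- ∩ F* over all members F* of the domain.

    A group of incoming neighbors of node i may fail simultaneously
    exactly when it is a subset of one of the returned sets.
    """
    return {frozenset(in_neighbors[i] & member) for member in fault_domain}

# Hypothetical example: node 1 has incoming neighbors 2 and 3.
domain = [frozenset({1}), frozenset({2}), frozenset({3, 4})]
view = local_fault_view(1, {1: {2, 3}}, domain)
```

Here node 1 learns only that neighbor 2 alone or neighbor 3 alone may fail; it never needs to know that node 4 shares a fault set with node 3.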


3  Iterative  Approximate  Byzantine  Consensus  (IABC)  Algorithms 

In this section, we describe the structure of the IABC algorithms of interest, and state the validity
and convergence conditions that they must satisfy.

Each node $i$ maintains state $v_i$, with $v_i[t]$ denoting the state of node $i$ at the end of the $t$-th
iteration of the algorithm. The initial state of node $i$, $v_i[0]$, is equal to the initial input provided to node
$i$. At the start of the $t$-th iteration ($t > 0$), the state of node $i$ is $v_i[t-1]$. The IABC algorithms of
interest will require each node $i$ to perform the following three steps in iteration $t$, where $t > 0$.
Note that the faulty nodes may deviate from this specification.

1. Transmit step: Transmit current state, namely $v_i[t-1]$, on all outgoing edges and self-loop (to
nodes in $N_i^+$ and node $i$ itself).

2. Receive step: Receive values on all incoming edges and self-loop (from nodes in $N_i^-$ and itself).
Denote by $r_i[t]$ the vector of values received by node $i$ from its incoming neighbors and itself.
The size of vector $r_i[t]$ is $|N_i^-| + 1$.

3. Update step: Node $i$ updates its state using a transition function $Z_i$ as follows. $Z_i$ is a part of
the specification of the algorithm, and takes the vector $r_i[t]$ as the input.

$$v_i[t] = Z_i(r_i[t]) \tag{1}$$

The following conditions must be satisfied by an IABC algorithm when the set of faulty nodes (in
a given execution) is $F$:

• Validity: $\forall t > 0$, and all fault-free nodes $i \in \mathcal{V} - F$,

$$v_i[t] \geq \min_{j \in \mathcal{V} - F} v_j[t-1] \quad \text{and} \quad v_i[t] \leq \max_{j \in \mathcal{V} - F} v_j[t-1].$$ ¹

• Convergence: for all fault-free nodes $i, j \in \mathcal{V} - F$, $\lim_{t \to \infty} (v_i[t] - v_j[t]) = 0$.

An IABC algorithm is said to be correct if it satisfies the above validity and convergence
conditions in the given graph $G(\mathcal{V}, \mathcal{E})$. For a given fault domain $\mathcal{F}$ for graph $G(\mathcal{V}, \mathcal{E})$, the objective
here is to identify the necessary and sufficient conditions for the existence of a correct IABC
algorithm.
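The two conditions can be checked on a recorded execution trace. The following sketch assumes states are stored as dictionaries from node identifiers to real values; the helper names are hypothetical. Because convergence is a statement about the limit, a finite trace can only be tested against a chosen tolerance, which is what `spread` supports.

```python
def satisfies_validity(previous, current, fault_free):
    """Check the validity condition for one iteration.

    previous, current: dicts mapping node -> state at t-1 and at t.
    fault_free: the nodes in V - F; states of faulty nodes are ignored.
    """
    lo = min(previous[j] for j in fault_free)
    hi = max(previous[j] for j in fault_free)
    return all(lo <= current[i] <= hi for i in fault_free)

def spread(states, fault_free):
    """Largest difference between the states of two fault-free nodes."""
    values = [states[i] for i in fault_free]
    return max(values) - min(values)

# Hypothetical trace: node 4 is faulty and reports arbitrary values.
before = {1: 0.0, 2: 4.0, 3: 10.0, 4: -50.0}
after = {1: 2.0, 2: 4.0, 3: 7.0, 4: 99.0}
ok = satisfies_validity(before, after, {1, 2, 3})
```

An execution is consistent with convergence to within $\epsilon$ once `spread(states, fault_free)` stays at or below $\epsilon$.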


4  Necessary  Condition 

In  this  section,  we  develop  a  necessary  condition  for  the  existence  of  a  correct  IABC  algorithm. 
The  necessary  condition  will  be  proved  to  be  also  sufficient  in  Section  6. 

¹ For sets $X$ and $Y$, $X - Y$ contains elements that are in $X$ but not in $Y$. That is, $X - Y = \{i \mid i \in X, i \notin Y\}$.

4.1 Preliminaries


To  facilitate  the  statement  of  the  necessary  condition,  we  first  introduce  the  notions  of  "source 
component"  and  "reduced  graph"  using  the  following  three  definitions. 

Definition 2 Graph Decomposition: Let $H$ be a directed graph. Partition graph $H$ into strongly
connected components, $H_1, H_2, \ldots, H_h$, where $h$ is a non-zero integer dependent on graph $H$, such that

• every pair of nodes within the same strongly connected component has directed paths in $H$ to each
other, and

• for each pair of nodes, say $i$ and $j$, that belong to two different strongly connected components, either
$i$ does not have a directed path to $j$ in $H$, or $j$ does not have a directed path to $i$ in $H$.

Construct a graph $H^d$ wherein each strongly connected component $H_k$ above is represented by vertex $c_k$, and
there is an edge from vertex $c_k$ to vertex $c_l$ if and only if the nodes in $H_k$ have directed paths in $H$ to the nodes
in $H_l$. $H^d$ is called the decomposition graph of $H$.


It is known that for any directed graph $H$, the corresponding decomposition graph $H^d$ is a directed
acyclic graph (DAG) [3].

Definition 3 Source Component: Let $H$ be a directed graph, and let $H^d$ be its decomposition graph as per
Definition 2. Strongly connected component $H_k$ of $H$ is said to be a source component if the corresponding
vertex $c_k$ in $H^d$ is not reachable from any other vertex in $H^d$.
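Definitions 2 and 3 can be made concrete with a short sketch. It computes strongly connected components by pairwise reachability, which is simple rather than efficient, and then keeps the components that receive no edge from another component; in a DAG this is equivalent to not being reachable from any other vertex. All function names are illustrative assumptions.

```python
def reachable_from(adj, start):
    """Set of nodes reachable from start (including start itself)."""
    seen, stack = {start}, [start]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def strongly_connected_components(nodes, edges):
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
    reach = {u: reachable_from(adj, u) for u in nodes}
    # u and v share a component when each is reachable from the other.
    return {frozenset(v for v in nodes if v in reach[u] and u in reach[v])
            for u in nodes}

def source_components(nodes, edges):
    """Components with no incoming edge from a different component."""
    comps = strongly_connected_components(nodes, edges)
    comp_of = {u: c for c in comps for u in c}
    entered = {comp_of[v] for u, v in edges if comp_of[u] != comp_of[v]}
    return comps - entered
```

For example, with edges `[(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)]` the components are `{1, 2}` and `{3, 4}`, and only `{1, 2}` is a source component.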

Definition 4 Reduced Graph: For a given graph $G(\mathcal{V}, \mathcal{E})$ and a feasible fault set $F$, a reduced graph
$G_F(\mathcal{V}_F, \mathcal{E}_F)$ is obtained as follows:

• Node set is obtained as $\mathcal{V}_F = \mathcal{V} - F$.

• For each node $i \in \mathcal{V}_F$, a feasible fault set $F_x(i)$ is chosen, and then the edge set $\mathcal{E}_F$ is obtained as follows:

- remove from $\mathcal{E}$ all the links incident on the nodes in $F$, and

- for each $i \in \mathcal{V}_F$ and each $j \in F_x(i) \cap \mathcal{V}_F \cap N_i^-$, remove link $(j, i)$ from $\mathcal{E}$.

Feasible fault sets $F_x(i)$ and $F_x(j)$ chosen for $i \neq j$ may or may not be identical.

Note that for a given $G(\mathcal{V}, \mathcal{E})$ and a given $F$, multiple reduced graphs $G_F$ may exist, depending
on the choice of $F_x$ sets above.
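The construction in Definition 4 can be sketched as a single filter over the edge set. The function name and argument layout below are assumptions; in particular, the caller is expected to pass feasible fault sets for `F` and for each entry of `Fx`, which this sketch does not verify.

```python
def reduced_graph(nodes, edges, F, Fx):
    """Build one reduced graph G_F.

    nodes: iterable of node identifiers.
    edges: iterable of directed edges (j, i), meaning j sends to i.
    F:     feasible fault set whose nodes are removed.
    Fx:    dict mapping each remaining node i to its chosen set F_x(i);
           a missing entry is treated as the empty set.
    """
    F = set(F)
    V_F = set(nodes) - F
    E_F = set()
    for j, i in edges:
        if j in F or i in F:
            continue  # link incident on a node in F
        if j in Fx.get(i, frozenset()):
            continue  # incoming link of i from a node in F_x(i)
        E_F.add((j, i))
    return V_F, E_F
```

Different choices of `Fx` for the same `F` yield the different reduced graphs mentioned above.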

4.2  Necessary  Condition 

For a correct IABC algorithm to exist, the network graph $G(\mathcal{V}, \mathcal{E})$ must satisfy the necessary
condition stated in Theorem 1 below.

Theorem 1 Suppose that a correct IABC algorithm exists for $G(\mathcal{V}, \mathcal{E})$. Then, any reduced graph $G_F$,
corresponding to any feasible fault set $F$, must contain exactly one source component.

Proof Sketch: A complete proof is presented in Appendix A. The proof is by contradiction. Let
us assume that a correct IABC algorithm exists, and for some feasible fault set $F$, and feasible sets
$F_x(i)$ for each $i \in \mathcal{V} - F$, the resulting reduced graph contains two source components. Let $L$ and
$R$ denote the nodes in the two source components, respectively. Thus, $L$ and $R$ are disjoint and
non-empty. Let $C = (\mathcal{V} - F - L - R)$ be the remaining nodes in the reduced graph. $C$ may or may
not be empty. Assume that the nodes in $F$ (if non-empty) are all faulty, and all the nodes in
$L$, $R$, and $C$ (if non-empty) are fault-free. Suppose that each node in $L$ has initial input equal to
$m$, each node in $R$ has initial input equal to $M$, where $M > m$, and each node in $C$ has an input
in the range $[m, M]$. As elaborated in Appendix A, the faulty nodes can behave in such a manner
that, in each iteration, nodes in $L$ and $R$ are forced to maintain their updated state equal to $m$ and
$M$, respectively, so as to satisfy the validity condition. This ensures that, no matter how many
iterations are performed, the convergence condition cannot be satisfied. □
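On small graphs, the condition in Theorem 1 can be checked by brute force. The sketch below enumerates every feasible fault set $F$ and every distinct choice of removed incoming links, and counts source components in each reduced graph. It is exponential and meant only for illustration; the function names are assumptions, and skipping the case where no node is fault-free is a choice made for this sketch rather than something stated in the paper.

```python
from itertools import combinations, product

def feasible_sets(fault_domain):
    """All subsets of members of the domain; the empty set is always
    included so that the fault-free execution is checked (assumption)."""
    out = {frozenset()}
    for member in fault_domain:
        items = sorted(member)
        for size in range(1, len(items) + 1):
            out.update(frozenset(c) for c in combinations(items, size))
    return out

def count_source_components(nodes, edges):
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)

    def reach(start):
        seen, stack = {start}, [start]
        while stack:
            for v in adj[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    r = {u: reach(u) for u in nodes}
    comp = {u: frozenset(v for v in nodes if v in r[u] and u in r[v])
            for u in nodes}
    entered = {comp[v] for u, v in edges if comp[u] != comp[v]}
    return len(set(comp.values()) - entered)

def satisfies_condition(nodes, edges, fault_domain):
    feasible = feasible_sets(fault_domain)
    for F in feasible:
        V_F = sorted(set(nodes) - F)
        if not V_F:
            continue  # assumption: nothing to check without fault-free nodes
        kept = [(j, i) for j, i in edges if j not in F and i not in F]
        # For node i only F_x(i) ∩ V_F ∩ N_i^- matters, so collect the
        # distinct sets of incoming links that some F_x(i) could remove.
        choices = []
        for i in V_F:
            incoming = {j for j, k in kept if k == i}
            choices.append({frozenset(incoming & Fx) for Fx in feasible})
        for removal in product(*choices):
            removed = {(j, i) for i, gone in zip(V_F, removal) for j in gone}
            E_F = [e for e in kept if e not in removed]
            if count_source_components(V_F, E_F) != 1:
                return False
    return True

def complete_graph(n):
    nodes = list(range(1, n + 1))
    return nodes, [(u, v) for u in nodes for v in nodes if u != v]
```

With the 1-total fault domain (all singletons), the complete graph on four nodes passes this check, while the complete graph on three nodes fails: removing one node and one incoming link at each of the remaining two leaves two isolated nodes, hence two source components.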


5  Algorithm  1 

We will prove that there exists an IABC algorithm, namely Algorithm 1 below, that satisfies
the validity and convergence conditions provided that the graph $G(\mathcal{V}, \mathcal{E})$ satisfies the necessary
condition in Theorem 1. This implies that the necessary condition in Theorem 1 is also sufficient.
Algorithm 1 has the three-step structure described in Section 3. This algorithm generalizes, so as
to accommodate the generalized fault model, iterative algorithms that were analyzed in prior
work [4, 12, 7, 11], including our own prior work [14, 13]. The key difference from
previous algorithms is in the Update step below.

Algorithm 1


1. Transmit step: Transmit current state $v_i[t-1]$ on all outgoing edges and self-loop.

2. Receive step: Receive values on all incoming edges and self-loop. These values form vector
$r_i[t]$ of size $|N_i^-| + 1$ (including the value from node $i$ itself). When a fault-free node expects to
receive a message from an incoming neighbor but does not receive the message, the message
value is assumed to be equal to some default value.

3. Update step: Sort the values in $r_i[t]$ in an increasing order (breaking ties arbitrarily). Let $D$ be
a vector of nodes arranged in an order "consistent" with $r_i[t]$: specifically, $D(1)$ is the node
that sent the smallest value in $r_i[t]$, $D(2)$ is the node that sent the second smallest value in
$r_i[t]$, and so on. The size of vector $D$ is also $|N_i^-| + 1$.

From vector $r_i[t]$, eliminate the smallest $f_1$ values, and the largest $f_2$ values, where $f_1$ and $f_2$
are defined as follows:

• $f_1$ is the largest number such that there exists a feasible fault set $F' \subseteq N_i^-$ containing
nodes $D(1), D(2), \ldots, D(f_1)$. Recall that $i \notin N_i^-$.

• $f_2$ is the largest number such that there exists a feasible fault set $F'' \subseteq N_i^-$ containing
nodes $D(|N_i^-| - f_2 + 2), D(|N_i^-| - f_2 + 3), \ldots, D(|N_i^-| + 1)$.

$F'$ and $F''$ above may or may not be identical.

Let $N_i^*[t]$ denote the set of nodes from whom the remaining $|N_i^-| + 1 - f_1 - f_2$ values in $r_i[t]$
were received, and let $w_j$ denote the value received from node $j \in N_i^*[t]$. Note that $i \in N_i^*[t]$.
Hence, for convenience, define $w_i = v_i[t-1]$ to be the value node $i$ "receives" from itself.
Observe that if $j \in N_i^*[t]$ is fault-free, then $w_j = v_j[t-1]$.

Define

$$v_i[t] = Z_i(r_i[t]) = \sum_{j \in N_i^*[t]} a_i \, w_j \tag{2}$$

where

$$a_i = \frac{1}{|N_i^*[t]|} = \frac{1}{|N_i^-| + 1 - f_1 - f_2}$$

The "weight" of each term on the right-hand side of (2) is $a_i$, and these weights add to 1.
Also, $0 < a_i \leq 1$. Although $f_1$, $f_2$ and $a_i$ may be different for each iteration $t$, for simplicity, we
do not explicitly represent this dependence on $t$ in the notations.


Observe that the $f_1 + f_2$ nodes whose values are eliminated in the Update step above are all in $N_i^-$. Thus,
the above algorithm can be implemented by node $i$ if it knows which of its incoming neighbors
may fail simultaneously; node $i$ does not need to know the entire fault domain $\mathcal{F}$ as such.
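The Update step of Algorithm 1 can be sketched as follows. The function name `update_step`, the `received` dictionary (sender to value, including node $i$ itself) and the list-of-frozensets encoding of the fault domain are assumptions made for this illustration. The loops rely on the fact that a prefix of $D$ is contained in a feasible fault set only if every shorter prefix is, so scanning outward from each end finds the largest $f_1$ and $f_2$.

```python
def update_step(i, received, fault_domain):
    """One Update step at node i.

    received: dict mapping each sender (incoming neighbors and i) to the
              value it delivered in this iteration.
    fault_domain: iterable of frozensets; for this step it may equally be
              the local view {N_i^- ∩ F*}, since only neighbors are tested.
    Returns the new state and the list of senders whose values survived.
    """
    def feasible(group):
        return any(group <= member for member in fault_domain)

    # D[0] sent the smallest value; ties are broken by insertion order.
    D = sorted(received, key=lambda j: received[j])

    f1 = 0  # node i is never eliminated, so both scans stop at i at the latest
    while D[f1] != i and feasible(frozenset(D[:f1 + 1])):
        f1 += 1
    f2 = 0
    while D[-1 - f2] != i and feasible(frozenset(D[len(D) - 1 - f2:])):
        f2 += 1

    survivors = D[f1:len(D) - f2]
    a_i = 1.0 / len(survivors)
    return sum(a_i * received[j] for j in survivors), survivors

# Hypothetical inputs for node 1 with incoming neighbors 2, 3 and 4.
received = {1: 5.0, 2: 0.0, 3: 100.0, 4: 90.0}
correlated = [frozenset({1}), frozenset({2}), frozenset({3, 4})]
new_state, kept = update_step(1, received, correlated)
```

With the correlated domain, both large values are dropped because nodes 3 and 4 may fail together, leaving only node 1's own value. Under a domain of singletons such as `[{2}, {3}, {4}]`, only the single largest value would be dropped.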

The main difference between the above algorithm and IABC algorithms in prior work is in the
choice of the values eliminated from vector $r_i[t]$ in the Update step. The manner in which the values
are eliminated ensures that the values received from nodes $D(f_1 + 1)$ and $D(|N_i^-| - f_2 + 1)$ (i.e., the
smallest and largest values that survive in $r_i[t]$) are within the convex hull of the state of fault-free
nodes, even if nodes $D(f_1 + 1)$ and $D(|N_i^-| - f_2 + 1)$ may not be fault-free. This property is useful in
proving algorithm correctness (as discussed below).


6 Sufficiency


We will show that Algorithm 1 satisfies validity and convergence conditions, provided that $G(\mathcal{V}, \mathcal{E})$
satisfies the condition below, which matches the necessary condition stated in Theorem 1.
Sufficient condition: Any reduced graph $G_F$ corresponding to any feasible fault set $F$ contains exactly one
source component.

In the rest of this section, we assume that $G(\mathcal{V}, \mathcal{E})$ satisfies the above condition. To prove its
sufficiency, we first develop a transition matrix representation of the Update step in Algorithm 1.

6.1  Transition  Matrix  Representation 

In our discussion below, $\mathbf{M}[t]$ is a square matrix, $\mathbf{M}_i[t]$ is the $i$-th row of the matrix, and $\mathbf{M}_{ij}[t]$ is
the element at the intersection of the $i$-th row and $j$-th column of $\mathbf{M}[t]$.

For a given execution of Algorithm 1, let $F$ denote the actual set of faulty nodes in that execution.
Let $|F| = \psi$. Without loss of generality, suppose that nodes 1 through $(n - \psi)$ are fault-free, and if
$\psi > 0$, nodes $(n - \psi + 1)$ through $n$ are faulty. Denote by $v[0]$ the column vector consisting of the
initial states of all the fault-free nodes. Denote by $v[t]$, where $t \geq 1$, the column vector consisting
of the states of all the fault-free nodes at the end of the $t$-th iteration. The $i$-th element of vector $v[t]$
is state $v_i[t]$. The size of vector $v[t]$ is $(n - \psi)$.

We will show that the iterative update of the state of a fault-free node $i$ ($1 \leq i \leq n - \psi$) performed
in (2) in Algorithm 1 can be expressed using the matrix form below.

$$v_i[t] = \mathbf{M}_i[t] \, v[t-1] \tag{3}$$

where $\mathbf{M}_i[t]$ is a stochastic row vector of size $n - \psi$. That is, $\mathbf{M}_{ij}[t] \geq 0$, for $1 \leq j \leq n - \psi$, and
$\sum_{1 \leq j \leq n - \psi} \mathbf{M}_{ij}[t] = 1$.² By "stacking" (3) for different $i$, $1 \leq i \leq n - \psi$, we will represent the Update
step of Algorithm 1 at all the fault-free nodes together using (4) below.

$$v[t] = \mathbf{M}[t] \, v[t-1] \tag{4}$$

where $\mathbf{M}[t]$ is a $(n - \psi) \times (n - \psi)$ row stochastic matrix, with its $i$-th row being equal to $\mathbf{M}_i[t]$ in (3).
$\mathbf{M}[t]$ is said to be a transition matrix.

In the rest of this section, we will first "construct" a transition matrix $\mathbf{M}[t]$ that satisfies certain
desirable properties. Then, we will identify a connection between the transition matrix and the
sufficiency condition stated above, and use this connection to establish the convergence property for
Algorithm 1. The validity property also follows from the transition matrix representation.
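The link between row stochasticity and validity can be seen in a small numerical sketch: each new state is a convex combination of the previous fault-free states, so it cannot leave their range. The matrix and state vector below are hypothetical values chosen for illustration, and the helper names are assumptions.

```python
def is_row_stochastic(M, tol=1e-12):
    """Every entry is non-negative and every row sums to 1."""
    return all(
        all(x >= 0 for x in row) and abs(sum(row) - 1.0) <= tol
        for row in M
    )

def step(M, v):
    """Compute M v for a matrix given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical transition matrix for three fault-free nodes.
M = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
v_prev = [0.0, 4.0, 10.0]
v_next = step(M, v_prev)
```

Every entry of `v_next` lies between `min(v_prev)` and `max(v_prev)`, which is the validity condition for this single iteration.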

6.2  Construction  of  the  Transition  Matrix 

We  will  construct  a  transition  matrix  with  the  property  described  in  Lemma  1  below. 

Lemma 1 The Update step of Algorithm 1 at the fault-free nodes can be expressed using row stochastic
transition matrix $\mathbf{M}[t]$, such that there exists a feasible fault set $F_x(i)$ for each $i \in \mathcal{V} - F$ such that, for all
$j \in \{i\} \cup ((\mathcal{V}_F - F_x(i)) \cap N_i^-)$,

$$\mathbf{M}_{ij}[t] \geq \beta$$

where $\beta$ is a constant (to be defined later), and $0 < \beta \leq 1$.

² In addition to $t$, the row vector $\mathbf{M}_i[t]$ may depend on the state vector $v[t-1]$ as well as the behavior of the faulty
nodes in $F$. For simplicity, the notation $\mathbf{M}_i[t]$ does not explicitly represent this dependence.

In [13] as well, we construct a transition matrix to prove correctness of an IABC algorithm under
the $f$-total fault model. However, the generalized fault model introduces additional complexity,
which is handled here using a new approach to construct the transition matrix.


Proof: We prove the correctness of Lemma 1 by constructing $\mathbf{M}_i[t]$ for $1 \leq i \leq n - \psi$ that satisfies
the conditions in Lemma 1. Recall that $F$ is the set of faulty nodes, and $|F| = \psi$. As stated before,
without loss of generality, nodes 1 through $n - \psi$ are assumed to be fault-free, and the remaining
$\psi$ nodes faulty.

Consider a fault-free node $i$ performing the Update step in Algorithm 1. In the Update step,
recall that the smallest $f_1$ and the largest $f_2$ values are eliminated from $r_i[t]$, where the choice of $f_1$
and $f_2$ is described in Algorithm 1. Let us denote by $\mathcal{S}$ and $\mathcal{L}$, respectively, the set of nodes³ from
whom the smallest $f_1$ and the largest $f_2$ values were received by node $i$ in iteration $t$. Define sets
$\mathcal{S}_g$ and $\mathcal{L}_g$ to be subsets of $\mathcal{S}$ and $\mathcal{L}$ that contain all the fault-free nodes in $\mathcal{S}$ and $\mathcal{L}$, respectively.
That is, $\mathcal{S}_g = \mathcal{S} \cap (\mathcal{V} - F)$ and $\mathcal{L}_g = \mathcal{L} \cap (\mathcal{V} - F)$.

Construction of $\mathbf{M}_i[t]$ differs somewhat depending on whether sets $\mathcal{S}_g$, $\mathcal{L}_g$ and $N_i^*[t] \cap F$ are
empty or non-empty. We divide the possibilities into 6 separate cases. Due to space limitation,
here we present the construction for one of the cases (named Case I). The construction for the
remaining cases is presented in Appendix B.

In Case I, $\mathcal{S}_g \neq \emptyset$, $\mathcal{L}_g \neq \emptyset$, and $N_i^*[t] \cap F \neq \emptyset$. Let $m_S$ and $m_L$ be defined as shown below. Recall
that the nodes in $\mathcal{S}_g$ and $\mathcal{L}_g$ are all fault-free, and therefore, for any node $j \in \mathcal{S}_g \cup \mathcal{L}_g$, $w_j = v_j[t-1]$
(in the notation of Algorithm 1).


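$$m_S = \frac{\sum_{j \in \mathcal{S}_g} v_j[t-1]}{|\mathcal{S}_g|} \quad \text{and} \quad m_L = \frac{\sum_{j \in \mathcal{L}_g} v_j[t-1]}{|\mathcal{L}_g|}$$

Now, consider any node $k \in N_i^*[t]$. By the definition of sets $\mathcal{S}_g$ and $\mathcal{L}_g$, $m_S \leq w_k \leq m_L$. Therefore,
we can find weights $S_k \geq 0$ and $L_k \geq 0$ such that $S_k + L_k = 1$, and

$$w_k = S_k \, m_S + L_k \, m_L \tag{5}$$

$$\phantom{w_k} = \frac{S_k}{|\mathcal{S}_g|} \sum_{j \in \mathcal{S}_g} v_j[t-1] + \frac{L_k}{|\mathcal{L}_g|} \sum_{j \in \mathcal{L}_g} v_j[t-1] \tag{6}$$

Clearly, at least one of $S_k$ and $L_k$ must be $\geq 1/2$. We now define elements $\mathbf{M}_{ij}[t]$ of row
$\mathbf{M}_i[t]$: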

• For $j \in N_i^*[t] \cap (\mathcal{V} - F)$: In this case, $j$ is either a fault-free incoming neighbor of $i$, or $i$ itself.
For each such $j$, define $\mathbf{M}_{ij}[t] = a_i$. This is obtained by observing in (2) that the contribution
of such a node $j$ to the new state $v_i[t]$ is $a_i w_j = a_i v_j[t-1]$.

The elements of $\mathbf{M}_i[t]$ defined here add up to

$$|N_i^*[t] \cap (\mathcal{V} - F)| \, a_i$$

³ Although $\mathcal{S}$ and $\mathcal{L}$ may be different for each $t$, for simplicity, we do not explicitly represent this dependence on $t$ in
the notations $\mathcal{S}$ and $\mathcal{L}$.

• For $j \in \mathcal{S}_g \cup \mathcal{L}_g$: In this case, $j$ is a fault-free node in $\mathcal{S}$ or $\mathcal{L}$.
For each $j \in \mathcal{S}_g$,

$$\mathbf{M}_{ij}[t] = a_i \sum_{k \in N_i^*[t] \cap F} \frac{S_k}{|\mathcal{S}_g|}$$

and for each node $j \in \mathcal{L}_g$,

$$\mathbf{M}_{ij}[t] = a_i \sum_{k \in N_i^*[t] \cap F} \frac{L_k}{|\mathcal{L}_g|}$$

To obtain these two expressions, we represent the value sent by each faulty node $k$ in $N_i^*[t]$,
i.e., $k \in N_i^*[t] \cap F$, using (6). Recall that this node $k$ contributes $a_i w_k$ to (2). The above two
expressions are then obtained by summing (6) over all the faulty nodes in $N_i^*[t] \cap F$, and
replacing this sum by equivalent contributions by nodes in $\mathcal{S}_g$ and $\mathcal{L}_g$.

The elements of $\mathbf{M}_i[t]$ defined here add up to

$$a_i \sum_{k \in N_i^*[t] \cap F} (S_k + L_k) = |N_i^*[t] \cap F| \, a_i.$$

• For $j \in (\mathcal{V} - F) - (N_i^*[t] \cup \mathcal{S}_g \cup \mathcal{L}_g)$: These fault-free nodes have not yet been considered
above. For each such node $j$, define $\mathbf{M}_{ij}[t] = 0$.

With the above definition of $\mathbf{M}_i[t]$, it should be easy to see that $\mathbf{M}_i[t] \, v[t-1]$ is, in fact, identical
to $v_i[t]$ obtained using (2). Thus, the above construction of $\mathbf{M}_i[t]$ results in the contribution of the
faulty nodes in $N_i^*[t]$ to (2) being replaced by an equivalent contribution from fault-free nodes in
$\mathcal{L}_g$ and $\mathcal{S}_g$.
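The Case I construction can be checked numerically. The sketch below builds the row for one node from the trimmed sets and the values reported by faulty survivors, using exact fractions so that the equality with the averaged update in (2) can be tested without rounding. The function name, argument names and example values are assumptions; the sketch also assumes each faulty value lies between the two averages $m_S$ and $m_L$, as the construction requires.

```python
from fractions import Fraction

def case1_row(states, N_star, S_g, L_g, faulty_reports):
    """Row of the transition matrix for one node in Case I.

    states: dict fault-free node -> previous state v_j[t-1].
    N_star: senders whose values survived trimming (includes the node itself).
    S_g, L_g: non-empty sets of fault-free nodes trimmed as smallest / largest.
    faulty_reports: dict faulty node in N_star -> value w_k it sent.
    Returns a dict fault-free node -> matrix entry.
    """
    a_i = Fraction(1, len(N_star))
    m_S = sum(Fraction(states[j]) for j in S_g) / len(S_g)
    m_L = sum(Fraction(states[j]) for j in L_g) / len(L_g)
    row = {j: Fraction(0) for j in states}
    for j in N_star:
        if j in states:  # fault-free survivor, or the node itself
            row[j] += a_i
    for w_k in faulty_reports.values():
        # Write w_k = S_k m_S + L_k m_L with S_k + L_k = 1.
        if m_S == m_L:
            L_k = Fraction(1, 2)  # any split works when the averages coincide
        else:
            L_k = (Fraction(w_k) - m_S) / (m_L - m_S)
        S_k = 1 - L_k
        for j in S_g:
            row[j] += a_i * S_k / len(S_g)
        for j in L_g:
            row[j] += a_i * L_k / len(L_g)
    return row

# Hypothetical iteration at node 1: nodes 1 to 4 are fault-free, node 5 is faulty.
states = {1: 4, 2: 0, 3: 10, 4: 6}
row = case1_row(states, N_star={1, 4, 5}, S_g={2}, L_g={3}, faulty_reports={5: 5})
```

In this example the direct update is $(4 + 6 + 5)/3 = 5$, and the row reproduces the same value using fault-free states only, with the faulty value 5 split evenly between nodes 2 and 3.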


Properties of $\mathbf{M}_i[t]$: First, we show that $\mathbf{M}[t]$ is row stochastic. Observe that all the elements of
$\mathbf{M}_i[t]$ are non-negative. Also, all the elements of $\mathbf{M}_i[t]$ above add up to

$$|N_i^*[t] \cap (\mathcal{V} - F)| \, a_i + |N_i^*[t] \cap F| \, a_i = |N_i^*[t]| \, a_i = 1$$

because $a_i = 1/|N_i^*[t]|$ as defined in Algorithm 1. Thus, $\mathbf{M}_i[t]$ is a stochastic row vector.

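Recall that from the above discussion, for $k \in N_i^*[t]$, one of $S_k$ and $L_k$ must be $\geq 1/2$. Without
loss of generality, assume that $S_s \geq 1/2$ for some $s \in N_i^*[t] \cap F$. Consequently, for each node $j \in \mathcal{S}_g$,
$\mathbf{M}_{ij}[t] \geq \frac{a_i}{|\mathcal{S}_g|} S_s \geq \frac{a_i}{2|\mathcal{S}_g|}$. Also, for each fault-free node $j$ in $N_i^*[t]$, $\mathbf{M}_{ij}[t] = a_i$. Thus, if $\beta$ is chosen such
that

$$0 < \beta \leq \frac{a_i}{2|\mathcal{S}_g|} \tag{7}$$

and $F_x(i)$ is defined to be equal to $\mathcal{L}$, then the condition in the lemma holds for node $i$. That is,
$\mathbf{M}_{ij}[t] \geq \beta$ for $j \in \{i\} \cup ((\mathcal{V}_F - F_x(i)) \cap N_i^-)$.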


All Cases Together: Using similar constructions in other cases as well (presented in Appendix
B) and a suitable choice of $\beta$ (presented in Appendix C), we can obtain a row stochastic matrix
$\mathbf{M}[t]$, and for each $i \in \mathcal{V} - F$ identify a feasible fault set $F_x(i)$, such that $\mathbf{M}_{ij}[t] \geq \beta$ for all $j \in
\{i\} \cup ((\mathcal{V}_F - F_x(i)) \cap N_i^-)$. Thus, Lemma 1 can be proved correct.

□

6.3 Validity and Convergence of Algorithm 1

The rest of the proof structure is derived from our previous work wherein we proved the correctness
of an IABC algorithm for the $f$-total fault model [13]. Let $R_F$ denote the set of all the reduced graphs
of $G(\mathcal{V}, \mathcal{E})$ corresponding to a feasible fault set $F$. Let $\tau = |R_F|$. $\tau$ depends on $F$ and the underlying
network, and is finite.

In this discussion, let us denote a reduced graph by an italic upper case letter, and the
corresponding "connectivity matrix" (defined below) using the same letter in boldface upper case.
Thus, $\mathbf{H}$ denotes the connectivity matrix for graph $H \in R_F$.

Non-zero elements of connectivity matrix $\mathbf{H}$ are defined as follows: (i) for $1 \leq i, j \leq n - \psi$,
$\mathbf{H}_{ij} = 1$ if and only if $(j, i) \in H$, and (ii) $\mathbf{H}_{ii} = 1$ for $1 \leq i \leq n - \psi$. That is, non-zero elements of row
$\mathbf{H}_i$ correspond to the incoming links at node $i$, and the self-loop at node $i$. Thus, the connectivity
matrix for any reduced graph in $R_F$ has a non-zero diagonal.

Based on the sufficient condition stated at the start of Section 6 and Lemma 1, we can show the
following key lemmas. The proofs are presented in Appendices D and E.

Lemma 2 For any $H \in R_F$, $\mathbf{H}^{n-\psi}$ has at least one non-zero column.

Lemma 3 For any $t \geq 1$, there exists a graph $H \in R_F$ such that $\beta \mathbf{H} \leq \mathbf{M}[t]$.
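The relation in Lemma 3 is an element-wise comparison between a scaled connectivity matrix and a transition matrix. The sketch below builds the connectivity matrix of a reduced graph and tests the comparison; the function names, the example graph and the example transition matrix are hypothetical.

```python
def connectivity_matrix(num_nodes, edges):
    """H[i-1][j-1] = 1 if (j, i) is an edge of the reduced graph or i == j.

    Nodes are numbered 1..num_nodes; edges are pairs (j, i) with j sending to i.
    """
    H = [[1 if i == j else 0 for j in range(num_nodes)] for i in range(num_nodes)]
    for j, i in edges:
        H[i - 1][j - 1] = 1
    return H

def dominated(H, M, beta):
    """True if beta * H <= M holds element-wise."""
    return all(beta * h <= m
               for h_row, m_row in zip(H, M)
               for h, m in zip(h_row, m_row))

# Hypothetical reduced graph: a directed cycle 1 -> 2 -> 3 -> 1.
H = connectivity_matrix(3, [(1, 2), (2, 3), (3, 1)])
M = [[0.5, 0.0, 0.5],
     [0.25, 0.75, 0.0],
     [0.1, 0.4, 0.5]]
```

Here `dominated(H, M, 0.25)` holds, while a larger constant such as 0.3 fails because of the entry 0.25 in the second row.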

Theorem 2 Suppose that $G(\mathcal{V}, \mathcal{E})$ satisfies the sufficient condition stated above. Algorithm 1 satisfies both
the validity and convergence conditions.

Proof: A complete proof is presented in Appendix F. By repeated application of (4), we can represent the Update step of Algorithm 1 at the $t$-th iteration ($t \ge 1$) as:

$$\mathbf{v}[t] = \Big(\prod_{i=1}^{t} \mathbf{M}[i]\Big)\, \mathbf{v}[0] \qquad (8)$$

where $\mathbf{M}[i]$ is constructed as described above. When presenting matrix products, we adopt the following convention: for $a \le b$, $\prod_{i=a}^{b} \mathbf{A}[i]$ denotes the "backward" product $\mathbf{A}[b]\mathbf{A}[b-1] \cdots \mathbf{A}[a]$. Thus, $\prod_{i=1}^{t} \mathbf{M}[i]$ in (8) above represents $\mathbf{M}[t]\mathbf{M}[t-1] \cdots \mathbf{M}[1]$.

Since $\mathbf{M}[t]$ is row stochastic, it follows from (4) that Algorithm 1 satisfies the validity condition. Based on Lemmas 2 and 3, we can also show that the rows of $\prod_{i=1}^{t} \mathbf{M}[i]$ become identical in the limit (as elaborated in Appendix F). This observation and (8) together imply that the states of the fault-free nodes satisfy the convergence condition too. □
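To make the backward-product convention concrete, the following Python sketch multiplies newer matrices on the left and applies the result to an initial state. The two row stochastic matrices and the initial vector are made-up illustrative values, not matrices produced by Algorithm 1, and the helper names are hypothetical.

```python
def mat_mul(A, B):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def backward_product(mats):
    """mats = [M[1], M[2], ..., M[t]]; returns M[t] M[t-1] ... M[1]."""
    P = mats[0]
    for M in mats[1:]:
        P = mat_mul(M, P)    # the newer matrix multiplies on the left
    return P


def apply(P, v):
    """Matrix-vector product P v."""
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]


# Illustrative row stochastic matrices and initial state (assumed values).
M1 = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
M2 = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
v0 = [0.0, 4.0, 8.0]
v2 = apply(backward_product([M1, M2]), v0)
```

Because every row of a row stochastic matrix forms a convex combination, each entry of `v2` stays within the range spanned by `v0`, which is the validity property in miniature.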


7  Conclusions 

This paper considers a generalized fault model, which can be used to specify more complex failure patterns, such as correlated failures or non-uniform node reliabilities. Under this fault model, we prove a tight necessary and sufficient condition for the existence of synchronous iterative approximate Byzantine consensus algorithms in arbitrary directed graphs. The analysis of consensus under the generalized fault model sheds new light on how the fault model affects algorithm design.
APPENDIX


A  Necessity  Proof  in  Section  4 

Now, we present the proof for Theorem 1. The proof is by contradiction. Let us assume that a correct IABC algorithm exists, and that for some feasible fault set $F$, and feasible fault sets $F_x(i)$ for each $i \in \mathcal{V} - F$, the resulting reduced graph contains two source components.

Let $L$ and $R$ denote the nodes in the two source components, respectively. Thus, $L$ and $R$ are disjoint and non-empty. Let $C = \mathcal{V} - F - L - R$ be the remaining nodes in the reduced graph; $C$ may be empty. Let us now assume that the nodes in $F$ (if non-empty) are all faulty, and all the nodes in $L$, $R$, and $C$ (if non-empty) are fault-free.

Consider the case when (i) each node in $L$ has initial input $m$, (ii) each node in $R$ has initial input $M$, such that $M > m$, and (iii) each node in $C$ (if non-empty) has an input in the interval $[m, M]$.

In the Transmit step of iteration 1 of the IABC algorithm, suppose that the faulty nodes in $F$ (if non-empty) send $m^- < m$ on outgoing links to nodes in $L$, send $M^+ > M$ on outgoing links to nodes in $R$, and send some arbitrary value in the interval $[m, M]$ on outgoing links to nodes in $C$ (if non-empty). This behavior is possible since nodes in $F$ are Byzantine faulty. Note that $m^- < m < M < M^+$. Each fault-free node $k \in \mathcal{V} - F$ sends its value $v_k[0]$ to the nodes in $N_k^+$ in iteration 1.

Consider any node $i \in L$. Since $L$ is a source component in the reduced graph, it must be true that $N_i^- \cap (C \cup R) \subseteq N_i^- \cap F_x(i) \cap \mathcal{V}_F$.⁴

Now, node $i$ receives $m^-$ from the nodes in $N_i^- \cap F$, values in $[m, M]$ from the nodes in $N_i^- \cap (C \cup R)$, and $m$ from the nodes in $\{i\} \cup (N_i^- \cap L)$. Figure 1 illustrates the behavior of faulty nodes in $F$ and the values received by node $i$.

Consider the following two cases:

• $N_i^- \cap F$ and $N_i^- \cap (C \cup R)$ are both non-empty: In this case, $(N_i^- \cap F) \subseteq F$ and $N_i^- \cap (C \cup R) = N_i^- \cap F_x(i) \cap \mathcal{V}_F \subseteq F_x(i)$. From node $i$'s perspective, consider two possible scenarios: (a) nodes in $N_i^- \cap F$ are all faulty, and the other nodes are fault-free, and (b) nodes in $N_i^- \cap (C \cup R) = N_i^- \cap F_x(i) \cap \mathcal{V}_F$ are all faulty, and the other nodes are fault-free. Note that, since $F_x(i)$ is a feasible fault set, $N_i^- \cap F_x(i) \cap \mathcal{V}_F$ is also a feasible fault set. Similarly, since $F$ is a feasible fault set, $N_i^- \cap F$ is also a feasible fault set.

In scenario (a), from node $i$'s perspective, the fault-free nodes have sent values in the interval $[m, M]$, whereas the faulty incoming neighbors, i.e., nodes in $N_i^- \cap F$, have sent value $m^-$. According to the validity condition, $v_i[1] \ge m$. On the other hand, in scenario (b), the fault-free incoming neighbors have sent values $m^-$ and $m$, where $m^- < m$; so $v_i[1] \le m$, according to the validity condition. Since node $i$ does not know whether the correct scenario is (a) or (b), it must update its state to satisfy the validity condition in both cases. Thus, it follows that $v_i[1] = m$.

⁴ Explanation: In the reduced graph, there are no incoming links at $i$ from nodes in $N_i^- \cap (C \cup R)$. Thus, any incoming links in $\mathcal{E}$ from the nodes in $N_i^- \cap (C \cup R)$ must have been removed when constructing $\mathcal{E}_F$ for the reduced graph. Recall that when constructing $\mathcal{E}_F$, incoming links from nodes in $N_i^- \cap F_x(i) \cap \mathcal{V}_F$ are removed. It should be noted that the algorithm is performed using the links in $\mathcal{E}$, not the reduced graph. Thus, in the Transmit step, all links in $\mathcal{E}$ are used.
Figure 1: Illustration of the behavior of faulty nodes in $F$ and the value received at node $i$.

• At most one of $N_i^- \cap F$ and $N_i^- \cap (C \cup R)$ is non-empty: Recall that $N_i^- \cap F$ and $N_i^- \cap (C \cup R) = N_i^- \cap F_x(i) \cap \mathcal{V}_F$ are both feasible fault sets. Since at least one of these two sets is empty, their union, i.e., $(N_i^- \cap F) \cup (N_i^- \cap (C \cup R))$, is also a feasible fault set.

Then, from node $i$'s perspective, it is possible that all the nodes in $(N_i^- \cap F) \cup (N_i^- \cap (C \cup R))$ are faulty, and the rest of the nodes are fault-free. In this situation, the values sent to node $i$ by the fault-free nodes (which are all in $\{i\} \cup (N_i^- \cap L)$) are all $m$, and therefore, $v_i[1]$ must be set to $m$ as per the validity condition.

Hence, $v_i[1] = m$ for each node $i \in L$. Similarly, we can show that $v_j[1] = M$ for each node $j \in R$.

Now consider the nodes in set $C$ (if non-empty). All the values received by the nodes in $C$ are in $[m, M]$; therefore, their new state must also remain in $[m, M]$, as per the validity condition.

The above discussion implies that, at the end of iteration 1, the following conditions hold true: (i) the state of each node in $L$ is $m$, (ii) the state of each node in $R$ is $M$, and (iii) the state of each node in $C$ (if non-empty) is in the interval $[m, M]$. These conditions are identical to the initial conditions listed previously. Then, by a repeated application of the above argument (proof by induction), it follows that for any $t \ge 0$, $v_i[t] = m$ for all nodes $i \in L$, $v_j[t] = M$ for all nodes $j \in R$, and $v_k[t] \in [m, M]$ for all nodes $k \in C$.

Since $L$ and $R$ both contain fault-free nodes, and $m \ne M$, the convergence requirement is not satisfied. This contradicts the assumption that a correct iterative algorithm exists in $G(\mathcal{V},\mathcal{E})$.
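The contradiction above hinges on the reduced graph containing two source components, i.e., strongly connected components with no link entering from outside. A minimal Python sketch for locating them in a small directed graph follows. It assumes edges are given as pairs (j, i) for a link from j to i, uses a simple reachability computation rather than an optimized algorithm, and the function names are illustrative, not from the paper.

```python
def reachable(adj, start):
    """Return the set of nodes reachable from start (including start)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def source_components(nodes, edges):
    """Return the strongly connected components with no incoming link."""
    adj = {u: [] for u in nodes}
    for j, i in edges:
        adj[j].append(i)
    reach = {u: reachable(adj, u) for u in nodes}
    sources, assigned = [], set()
    for u in nodes:
        if u in assigned:
            continue
        # Nodes mutually reachable with u form u's strongly connected component.
        comp = frozenset(w for w in reach[u] if u in reach[w])
        assigned |= comp
        # A source component has no link entering it from a node outside it.
        if not any(j not in comp and i in comp for j, i in edges):
            sources.append(comp)
    return sources


# Hypothetical reduced graph: {0, 1} and {2, 3} both feed node 4.
edges = [(0, 1), (1, 0), (2, 3), (3, 2), (1, 4), (3, 4)]
two_sources = source_components(range(5), edges)
# Adding a link 0 -> 2 leaves {0, 1} as the only source component.
one_source = source_components(range(5), edges + [(0, 2)])
```

In the first graph the two components play the roles of $L$ and $R$ in the proof; in the second graph the necessary condition is no longer violated by this particular reduced graph.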
B Construction for other Cases in Section 6.2


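When discussing Case I in Section 6.2, we deferred discussion of the other cases. We present the construction for the rest of the cases here. There are six cases in total:

• Case I: $\mathcal{S}_g \ne \emptyset$, $\mathcal{L}_g \ne \emptyset$, and $N_i^*[t] \cap F \ne \emptyset$.

• Case II: $\mathcal{S}_g \ne \emptyset$, $\mathcal{L}_g \ne \emptyset$, and $N_i^*[t] \cap F = \emptyset$.

• Case III: $\mathcal{S}_g = \emptyset$, $\mathcal{L}_g \ne \emptyset$, and $N_i^*[t] \cap F \ne \emptyset$.

• Case IV: $\mathcal{S}_g \ne \emptyset$, $\mathcal{L}_g = \emptyset$, and $N_i^*[t] \cap F \ne \emptyset$.

• Case V: $\mathcal{S}_g = \emptyset$, $\mathcal{L}_g = \emptyset$, and $N_i^*[t] \cap F \ne \emptyset$.

• Case VI: at most one of $\mathcal{S}_g$ and $\mathcal{L}_g$ is non-empty, and $N_i^*[t] \cap F = \emptyset$.

Note that the choice of $f_1$ and $f_2$ in Algorithm 1 ensures that the value from node $i$ itself is never dropped from $r_i[t]$; therefore, $i \in N_i^*[t]$, and $N_i^*[t]$ is always non-empty.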

B.1 Case II

Now, we consider the case when $\mathcal{S}_g \ne \emptyset$, $\mathcal{L}_g \ne \emptyset$, and $N_i^*[t] \cap F = \emptyset$. That is, each of $\mathcal{S}$ and $\mathcal{L}$ contains at least one fault-free node, and $N_i^*[t]$ contains only fault-free node(s). In fact, the analysis of Case II is very similar to the analysis presented in Section 6.2 for Case I, in which $N_i^*[t]$ does contain a faulty node.

We now discuss how the analysis of Case I can be applied to Case II. Rewrite (2) as follows:

$$v_i[t] = \frac{a_i}{2} v_i[t-1] + \frac{a_i}{2} v_i[t-1] + \sum_{j \in N_i^*[t] - \{i\}} a_i w_j \qquad (9)$$

$$\phantom{v_i[t]} = a_i w_z + a_i w_i + \sum_{j \in N_i^*[t] - \{i\}} a_i w_j \qquad (10)$$

In the above equation, $z$ is to be viewed as a "virtual" incoming neighbor of node $i$, which has sent value $w_z = \frac{v_i[t-1]}{2}$ to node $i$ in iteration $t$. With the above rewriting of the state update, the value received by node $i$ from itself should be viewed as $w_i = \frac{v_i[t-1]}{2}$ instead of $v_i[t-1]$. With this transformation, Case II now becomes identical to Case I, with virtual node $z$ being treated as an incoming neighbor of node $i$.

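In essence, a part of node $i$'s contribution (half, to be precise) is now replaced by an equivalent contribution by nodes in $\mathcal{L}_g$ and $\mathcal{S}_g$. We now define elements $\mathbf{M}_{ij}[t]$ of row $\mathbf{M}_i[t]$:

• For $j = i$: $\mathbf{M}_{ii}[t] = \frac{a_i}{2}$. This is obtained by observing in (2) that node $i$'s contribution to the new state $v_i[t]$ is $a_i \frac{v_i[t-1]}{2}$.

• For $j \in N_i^*[t] - \{i\}$: In this case, $j$ is a fault-free incoming neighbor of $i$. For each such $j$, define $\mathbf{M}_{ij}[t] = a_i$. This is obtained by observing in (2) that the contribution of node $j$ to the new state $v_i[t]$ is $a_i w_j = a_i v_j[t-1]$.

• For $j \in \mathcal{S}_g \cup \mathcal{L}_g$: In this case, $j$ is a fault-free node in $\mathcal{S}$ or $\mathcal{L}$. For each $j \in \mathcal{S}_g$,

$$\mathbf{M}_{ij}[t] = \frac{a_i}{2} \cdot \frac{S_z}{|\mathcal{S}_g|}$$

and for each node $j \in \mathcal{L}_g$,

$$\mathbf{M}_{ij}[t] = \frac{a_i}{2} \cdot \frac{L_z}{|\mathcal{L}_g|}$$

where $S_z$ and $L_z$ are chosen such that $S_z + L_z = 1$ and $w_z = \frac{v_i[t-1]}{2} = \frac{S_z}{2} \sum_{j \in \mathcal{S}_g} \frac{w_j}{|\mathcal{S}_g|} + \frac{L_z}{2} \sum_{j \in \mathcal{L}_g} \frac{w_j}{|\mathcal{L}_g|}$. Note that such $S_z$ and $L_z$ exist because, by definition of $\mathcal{S}_g$ and $\mathcal{L}_g$, $v_i[t-1] \ge w_j$ for all $j \in \mathcal{S}_g$, and $v_i[t-1] \le w_j$ for all $j \in \mathcal{L}_g$. The two expressions above are obtained by replacing the contribution of the virtual node $z$ by an equivalent contribution by the nodes in $\mathcal{S}_g$ and $\mathcal{L}_g$, respectively.

• For $j \in (\mathcal{V} - F) - (N_i^*[t] \cup \mathcal{S}_g \cup \mathcal{L}_g)$: These fault-free nodes have not yet been considered above. For each such node $j$, define $\mathbf{M}_{ij}[t] = 0$.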


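By an argument similar to that in Section 6.2, $\mathbf{M}_i[t]$ is a stochastic row vector. Without loss of generality, suppose that $S_z \ge 1/2$. Then for each node $j \in \mathcal{S}_g$, $\mathbf{M}_{ij}[t] = \frac{a_i S_z}{2|\mathcal{S}_g|} \ge \frac{a_i}{4|\mathcal{S}_g|}$. Also, for each fault-free node $j$ in $N_i^*[t] - \{i\}$, $\mathbf{M}_{ij}[t] = a_i$, and $\mathbf{M}_{ii}[t] = \frac{a_i}{2}$. Recall that by definition, $|\mathcal{S}_g| \ge 1$. Hence, if $\beta$ is chosen such that

$$0 < \beta \le \frac{a_i}{4|\mathcal{S}_g|} \qquad (11)$$

and $F_x(i)$ is defined to be equal to $\mathcal{L}$, then the condition in Lemma 1 holds for node $i$. That is, $\mathbf{M}_{ij}[t] \ge \beta$ for $j \in \{i\} \cup ((\mathcal{V}_F - F_x(i)) \cap N_i^-)$.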
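As a sanity check on the Case II construction, the sketch below builds the row for one node and compares it against the direct update. It rests on several assumptions made for illustration only: values are held in a dictionary `v` keyed by node label, the retained set, the fault-free dropped sets and the weight $a_i = 1/|N_i^*[t]|$ are supplied in the form the text describes, and the function name `case2_row` is hypothetical.

```python
def case2_row(i, n_star, s_g, l_g, v):
    """Build row i of M[t] for Case II (all retained values are fault-free).

    n_star: retained nodes (contains i); s_g / l_g: fault-free nodes whose
    values were dropped as smallest / largest; v: dict of states v[t-1].
    """
    a = 1.0 / len(n_star)
    avg_s = sum(v[j] for j in s_g) / len(s_g)
    avg_l = sum(v[j] for j in l_g) / len(l_g)
    # Choose S_z + L_z = 1 with v[i] = S_z * avg_s + L_z * avg_l.
    l_z = 0.5 if avg_l == avg_s else (v[i] - avg_s) / (avg_l - avg_s)
    s_z = 1.0 - l_z
    row = {j: 0.0 for j in v}
    for j in n_star:
        row[j] = a                       # retained fault-free neighbours
    row[i] = a / 2                       # node i keeps half of its weight
    for j in s_g:
        row[j] = a * s_z / (2 * len(s_g))
    for j in l_g:
        row[j] = a * l_z / (2 * len(l_g))
    return row


# Illustrative values: node 0 was dropped as smallest, node 3 as largest.
v = {0: 1.0, 1: 2.0, 2: 3.0, 3: 6.0}
row = case2_row(1, {1, 2}, {0}, {3}, v)
direct_update = (v[1] + v[2]) / 2        # update (2) with a_i = 1/2
via_row = sum(row[j] * v[j] for j in v)
```

In this example $S_z = 0.8 \ge 1/2$, so the entry for node 0 is at least $a_i / (4|\mathcal{S}_g|)$, matching the bound used for (11).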

B.2  Cases  III  and  IV 

Now, we describe the construction for Case III. The construction for Case IV is very similar, and thus is omitted here.

In Case III, $\mathcal{S}_g = \emptyset$, $\mathcal{L}_g \ne \emptyset$, and $N_i^*[t] \cap F \ne \emptyset$. Thus, $\mathcal{S}$ does not contain any fault-free nodes (hence $\mathcal{S}_g$ is empty). This may be due to one of the following two reasons: (i) the set $\mathcal{S}$ is non-empty, but all the nodes in $\mathcal{S}$ are faulty, or (ii) the set $\mathcal{S}$ is empty.

Assume that $l \in \mathcal{L}$ is a fault-free node, and that all the nodes in $\mathcal{S}$ are faulty (i.e., $\mathcal{S}_g = \emptyset$) or that $\mathcal{S}$ is empty (i.e., $f_1 = 0$). In this case, observe that node $D(f_1 + 1)$ must be fault-free (otherwise, $f_1$ cannot be the largest value as defined in Algorithm 1). Now, consider any node $k \in N_i^*[t]$. Similar to the argument in Case I, we can find weights $S_k \ge 0$ and $L_k \ge 0$ such that

$$S_k + L_k = 1$$

and

$$w_k = S_k\, v_{D(f_1+1)}[t-1] + L_k\, v_l[t-1] \qquad (12)$$

We now define $\mathbf{M}_{ij}[t]$ for all fault-free $j$.

• For $j \in (N_i^*[t] - \{D(f_1 + 1)\}) \cap (\mathcal{V} - F)$: That is, $j$ is a fault-free node in $N_i^*[t]$ with the exception of $D(f_1 + 1)$. For each such $j$, define $\mathbf{M}_{ij}[t] = a_i$. This is obtained by observing in (2) that the contribution of node $j$ to the new state $v_i[t]$ is $a_i w_j = a_i v_j[t-1]$.

The elements of $\mathbf{M}_i[t]$ defined here (including the case of $j = i$) add up to

$$(|N_i^*[t] \cap (\mathcal{V} - F)| - 1)\, a_i$$

• For nodes $D(f_1 + 1)$ and $l$: Define

$$\mathbf{M}_{i\,D(f_1+1)}[t] = a_i + a_i \sum_{k \in N_i^*[t] \cap F} S_k$$

and

$$\mathbf{M}_{il}[t] = a_i \sum_{k \in N_i^*[t] \cap F} L_k$$

Similar to Case I presented in Section 6.2, these two expressions are obtained by summing up the contribution over the faulty nodes in $N_i^*[t]$, and replacing the sum by an equivalent contribution by the nodes $D(f_1 + 1)$ and $l$, respectively, according to (12).

The above elements of $\mathbf{M}_i[t]$ add up to

$$a_i \Big(1 + \sum_{k \in N_i^*[t] \cap F} (S_k + L_k)\Big) = (1 + |N_i^*[t] \cap F|)\, a_i$$

• For $j \in (\mathcal{V} - F) - (N_i^*[t] \cup \{l\})$: These fault-free nodes have not yet been considered above. For each such $j$, define $\mathbf{M}_{ij}[t] = 0$.


Similar to Case I, in Case III as well, it should be easy to see that

$$\mathbf{M}_i[t]\, \mathbf{v}[t-1]$$

is identical to $v_i[t]$ obtained using (2).

Properties of $\mathbf{M}_i[t]$: All the elements of $\mathbf{M}_i[t]$ are non-negative. The elements of $\mathbf{M}_i[t]$ defined in Case III add up to

$$(|N_i^*[t] \cap (\mathcal{V} - F)| - 1)\, a_i + (1 + |N_i^*[t] \cap F|)\, a_i = |N_i^*[t]|\, a_i = 1$$

Thus, $\mathbf{M}_i[t]$ is a stochastic row vector.

In Case III, recall that for any fault-free node $j$ in $N_i^*[t]$ (including $j = D(f_1 + 1)$ and $j = i$), $\mathbf{M}_{ij}[t] \ge a_i$. Thus, if $\beta$ is chosen such that

$$0 < \beta \le a_i \qquad (13)$$

and $F_x(i)$ is defined to be equal to $\mathcal{L}$, then the condition in Lemma 1 holds for node $i$.
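The Case III construction can be checked numerically in the same spirit. The sketch below is an illustration under assumptions: `d` stands for node $D(f_1 + 1)$, `l` for the fault-free node chosen from the dropped largest values, `faulty_values` lists the values $w_k$ received from faulty nodes that were retained, and the name `case3_row` is hypothetical.

```python
def case3_row(n_star_good, faulty_values, d, l, v):
    """Build row i of M[t] for Case III.

    n_star_good: fault-free retained nodes (contains i and d);
    faulty_values: values w_k received from retained faulty nodes;
    d: node D(f1 + 1); l: fault-free node dropped as largest;
    v: dict of fault-free states v[t-1].
    """
    a = 1.0 / (len(n_star_good) + len(faulty_values))
    row = {j: 0.0 for j in v}
    for j in n_star_good:
        row[j] = a
    for w_k in faulty_values:
        # Split w_k as S_k * v[d] + L_k * v[l] with S_k + L_k = 1, as in (12).
        l_k = 0.0 if v[l] == v[d] else (w_k - v[d]) / (v[l] - v[d])
        s_k = 1.0 - l_k
        row[d] += a * s_k
        row[l] += a * l_k
    return row


# Illustrative values: node 1 updates; node 0 is D(f1 + 1); node 2 was
# dropped as largest; one retained faulty node reported the value 3.0.
v = {0: 1.0, 1: 2.0, 2: 5.0}
row = case3_row({0, 1}, [3.0], d=0, l=2, v=v)
direct_update = (v[0] + v[1] + 3.0) / 3   # update (2) with a_i = 1/3
via_row = sum(row[j] * v[j] for j in v)
```

The row involves only fault-free states, yet reproduces the update that used the faulty value, which is the purpose of the construction.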
B.3 Case V


Consider Case V, where $N_i^*[t] \cap F \ne \emptyset$, and $\mathcal{S}_g = \mathcal{L}_g = \emptyset$. In this case, it should be easy to see that $N_i^*[t]$ contains at least 3 nodes. In particular, $D(f_1 + 1)$ must be fault-free (otherwise, $f_1$ cannot be the maximum possible), $D(|N_i^-| - f_2 + 1)$ must be fault-free (otherwise, $f_2$ cannot be the maximum possible), and there is a faulty node in $N_i^*[t]$.

Now this case can be handled similarly to Case III analyzed above. In particular, entries in $\mathbf{M}_i[t]$ are defined similarly, with $l$ being defined equal to $D(|N_i^-| - f_2 + 1)$. Also, define $F_x(i) = \emptyset$.

Hence, it is easy to see that the properties of $\mathbf{M}_i[t]$ are identical to Case III presented above.


B.4  Case  VI 

Here, we consider the case when at most one of $\mathcal{S}$ and $\mathcal{L}$ contains a fault-free node and $N_i^*[t] \cap F = \emptyset$. Without loss of generality, suppose that $\mathcal{S}$ contains only faulty nodes, and $\mathcal{L}$ may contain a fault-free node.

In this case, define $\mathbf{M}_{ij}[t] = a_i$ for $j \in N_i^*[t]$; define $\mathbf{M}_{ij}[t] = 0$ for all other fault-free nodes $j$. Also, define $F_x(i) = \mathcal{L}$.

The properties of $\mathbf{M}_i[t]$ thus defined are identical to Case III above.


C  Putting  Cases  Together 

Now, let us consider Cases I-VI together. From the definition of $a_i$ in Algorithm 1, observe that $a_i \ge \frac{1}{|N_i^-| + 1}$ (because $f_1, f_2 \ge 0$). Let us define

$$\alpha = \min_{i \in \mathcal{V}} \frac{1}{|N_i^-| + 1}$$

Moreover, observe that $|\mathcal{S}_g| < n$ and $|\mathcal{L}_g| < n$. Then define $\beta$ as

$$\beta = \frac{\alpha}{4n} \qquad (14)$$

This definition satisfies the constraints on $\beta$ in Cases I through VI (conditions (7), (11) and (13)). Thus, Lemma 1 holds for all six cases with this choice of $\beta$.


D  Proof  of  Lemma  2  in  Section  6.3 


Here, we present the proof of the first key lemma used in the sufficiency proof.

Lemma 2 For any $H \in R_F$, $\mathbf{H}^{n-\psi}$ has at least one non-zero column.

Proof: $G(\mathcal{V},\mathcal{E})$ satisfies the sufficient condition stated at the start of Section 6. Therefore, there exists at least one non-faulty node $k$ in the reduced graph $H$ that has directed paths to all the nodes in $H$ (consisting of the edges in $H$). Since the length of the path from $k$ to any other node in $H$ is at most $n - \psi - 1$, the $k$-th column of matrix $\mathbf{H}^{n-\psi}$ will be non-zero.⁵ □
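The path-length argument can be observed on a small example. The sketch below tracks only the zero/non-zero pattern of the matrix powers (a Boolean product), which is all Lemma 2 needs; the chain graph and helper names are assumptions chosen for illustration.

```python
def bool_mul(A, B):
    """Zero/non-zero pattern of the product of two 0/1 matrices."""
    n = len(A)
    return [[1 if any(A[i][k] and B[k][j] for k in range(n)) else 0
             for j in range(n)] for i in range(n)]


def bool_power(H, p):
    """Zero/non-zero pattern of H raised to the power p (p >= 0)."""
    n = len(H)
    P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(p):
        P = bool_mul(P, H)
    return P


def nonzero_columns(P):
    """Indices of columns in which every element is non-zero."""
    n = len(P)
    return [k for k in range(n) if all(P[i][k] for i in range(n))]


# Connectivity matrix of the chain 0 -> 1 -> 2 (row i marks in-neighbours).
H = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
cols_power_1 = nonzero_columns(bool_power(H, 1))
cols_power_2 = nonzero_columns(bool_power(H, 2))
cols_power_3 = nonzero_columns(bool_power(H, 3))
```

Node 0 reaches every node within two hops, so column 0 is already full at the second power and stays full at the third, while no column is full at the first power.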

E  Proof  of  Lemma  3  in  Section  6.3 

Here, we present the proof of the second key lemma used in the sufficiency proof. We start with a definition:

Definition 5 For matrices $\mathbf{A}$ and $\mathbf{B}$ of identical size, and a scalar $\gamma$, $\gamma \mathbf{B} \le \mathbf{A}$ provided that $\gamma \mathbf{B}_{ij} \le \mathbf{A}_{ij}$ for all $i, j$.

We want to prove the following lemma.

Lemma 3 For any $t \ge 1$, there exists a graph $H \in R_F$ such that $\beta \mathbf{H} \le \mathbf{M}[t]$.

Proof: Observe that the $i$-th row of the transition matrix $\mathbf{M}[t]$ corresponds to the state update (in Algorithm 1) performed at fault-free node $i$. Recall from Lemma 1 that $\mathbf{M}_{ij}[t] \ge \beta$ for $j \in \{i\} \cup ((\mathcal{V}_F - F_x(i)) \cap N_i^-)$, where $F_x(i)$ is a feasible fault set.

Let us obtain a reduced graph $H$ by choosing $F_x(i)$ for each $i$ as defined in Lemma 1. Then Lemma 3 follows from the definition of the connectivity matrix $\mathbf{H}$. □


F  Correctness  of  Algorithm  1 

When presenting matrix products, we adopt the following convention: for $a \le b$, $\prod_{i=a}^{b} \mathbf{A}[i]$ denotes the "backward" product $\mathbf{A}[b]\mathbf{A}[b-1] \cdots \mathbf{A}[a]$.

The proof below is similar to a proof for the f-total fault model in our previous work [13]. It is included here for the convenience of the referees.


F.1 Matrix Preliminaries

In the discussion below, we use boldface upper case letters to denote matrices, rows of matrices, and their elements. For instance, $\mathbf{H}$ denotes a matrix, $\mathbf{H}_i$ denotes the $i$-th row of matrix $\mathbf{H}$, and $\mathbf{H}_{ij}$ denotes the element at the intersection of the $i$-th row and the $j$-th column of matrix $\mathbf{H}$.

Definition  6  A  vector  is  said  to  be  stochastic  if  all  the  elements  of  the  vector  are  non-negative,  and  the 
elements  add  up  to  1.  A  matrix  is  said  to  be  row  stochastic  if  each  row  of  the  matrix  is  a  stochastic  vector. 

⁵ That is, all the elements of the column will be non-zero. Also, such a non-zero column will exist in $\mathbf{H}^{n-\psi-1}$ too. We use the loose bound of $n - \psi$ to simplify the presentation.
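For a row stochastic matrix $\mathbf{A}$, coefficients of ergodicity $\delta(\mathbf{A})$ and $\lambda(\mathbf{A})$ are defined as follows [15]:

$$\delta(\mathbf{A}) = \max_{j} \max_{i_1, i_2} |\mathbf{A}_{i_1 j} - \mathbf{A}_{i_2 j}|$$

$$\lambda(\mathbf{A}) = 1 - \min_{i_1, i_2} \sum_{j} \min(\mathbf{A}_{i_1 j}, \mathbf{A}_{i_2 j})$$

It is easy to show that $0 \le \delta(\mathbf{A}) \le 1$ and $0 \le \lambda(\mathbf{A}) \le 1$, and that the rows of $\mathbf{A}$ are all identical if and only if $\delta(\mathbf{A}) = 0$. Also, $\lambda(\mathbf{A}) = 0$ if and only if $\delta(\mathbf{A}) = 0$.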
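Both coefficients are straightforward to evaluate for small matrices. The following Python sketch computes them by brute force over all pairs of rows; the function names `delta` and `lam` and the example matrices are illustrative choices, not part of the paper.

```python
def delta(A):
    """Largest difference between two entries of the same column."""
    rows, cols = len(A), len(A[0])
    return max(abs(A[r][j] - A[s][j])
               for j in range(cols) for r in range(rows) for s in range(rows))


def lam(A):
    """One minus the smallest row overlap: sum over j of min(A[r][j], A[s][j])."""
    rows, cols = len(A), len(A[0])
    return 1 - min(sum(min(A[r][j], A[s][j]) for j in range(cols))
                   for r in range(rows) for s in range(rows))


# Illustrative row stochastic matrices.
A = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.25, 0.25, 0.5]]
B = [[0.25, 0.75], [0.25, 0.75]]          # identical rows
I2 = [[1.0, 0.0], [0.0, 1.0]]             # rows share no column
```

For `B`, whose rows are identical, both coefficients vanish; for the identity matrix both equal 1, the other extreme of the stated ranges.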

The next result from [6] establishes a relation between the coefficient of ergodicity $\delta(\cdot)$ of a product of row stochastic matrices, and the coefficients of ergodicity $\lambda(\cdot)$ of the individual matrices defining the product.

Lemma 4 For any $p$ square row stochastic matrices $\mathbf{Q}(1), \mathbf{Q}(2), \ldots, \mathbf{Q}(p)$,

$$\delta(\mathbf{Q}(p)\mathbf{Q}(p-1) \cdots \mathbf{Q}(1)) \le \prod_{i=1}^{p} \lambda(\mathbf{Q}(i)).$$

Lemma 4 is proved in [6]. It implies that if, for all $i$, $\lambda(\mathbf{Q}(i)) \le 1 - \gamma$ for some $\gamma$, where $0 < \gamma \le 1$, then $\delta(\mathbf{Q}(p)\mathbf{Q}(p-1) \cdots \mathbf{Q}(1))$ will approach zero as $p$ approaches $\infty$. We now define a scrambling matrix [6, 15].

Definition 7 A row stochastic matrix $\mathbf{H}$ is said to be a scrambling matrix if $\lambda(\mathbf{H}) < 1$.

The following lemma follows easily from the above definition of $\lambda(\cdot)$.

Lemma 5 If any column of a row stochastic matrix $\mathbf{H}$ contains only non-zero elements that are all lower bounded by some constant $\gamma$, where $0 < \gamma \le 1$, then $\mathbf{H}$ is a scrambling matrix, and $\lambda(\mathbf{H}) \le 1 - \gamma$.
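Lemmas 4 and 5 can be spot-checked on a pair of small matrices. The sketch below uses exact fractions and two invented row stochastic matrices; it verifies the inequalities for this one instance only and is not a proof. All names are illustrative.

```python
from fractions import Fraction as Fr


def lam(A):
    """Coefficient lambda: one minus the smallest overlap between two rows."""
    n = len(A)
    return 1 - min(sum(min(x, y) for x, y in zip(A[r], A[s]))
                   for r in range(n) for s in range(n))


def delta(A):
    """Coefficient delta: largest within-column difference."""
    n = len(A)
    return max(abs(A[r][j] - A[s][j])
               for j in range(len(A[0])) for r in range(n) for s in range(n))


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


Q1 = [[Fr(1, 2), Fr(1, 2), Fr(0)],
      [Fr(0), Fr(1, 2), Fr(1, 2)],
      [Fr(1, 4), Fr(1, 4), Fr(1, 2)]]
Q2 = [[Fr(1), Fr(0), Fr(0)],
      [Fr(1, 2), Fr(1, 2), Fr(0)],
      [Fr(0), Fr(1, 2), Fr(1, 2)]]

# Column 1 of Q1 is entirely non-zero; gamma is its smallest entry.
gamma = min(row[1] for row in Q1)
lemma5_holds = lam(Q1) <= 1 - gamma
# Backward product Q2 Q1, as in Lemma 4 with p = 2.
lemma4_holds = delta(mat_mul(Q2, Q1)) <= lam(Q2) * lam(Q1)
```

Here `Q2` is not scrambling (its first and last rows share no column), yet the product bound still holds because `Q1` contributes a factor of 1/2.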


F.2  Correctness  of  Algorithm  1 

Lemma 6 For any $z \ge 1$, in the product below of $\mathbf{H}[t]$ matrices for consecutive $\tau(n - \psi)$ iterations, at least one column is non-zero.

$$\prod_{t=z}^{z + \tau(n-\psi) - 1} \mathbf{H}[t]$$

Proof: Since the above product consists of $\tau(n - \psi)$ connectivity matrices corresponding to graphs in $R_F$, at least one of the connectivity matrices corresponding to the $\tau$ distinct graphs in $R_F$, say matrix $\mathbf{H}_*$, will appear in the above product at least $n - \psi$ times.

Now observe that: (i) by Lemma 2, $\mathbf{H}_*^{n-\psi}$ contains a non-zero column, say the $k$-th column, and (ii) all the $\mathbf{H}[t]$ matrices in the product contain a non-zero diagonal. These two observations together imply that the $k$-th column in the above product is non-zero. □


Let us now define a sequence of matrices $\mathbf{Q}(i)$, $i \ge 1$, such that each of these matrices is a product of $\tau(n - \psi)$ of the $\mathbf{M}[t]$ matrices. Specifically,

$$\mathbf{Q}(i) = \prod_{t=(i-1)\tau(n-\psi)+1}^{i\tau(n-\psi)} \mathbf{M}[t] \qquad (15)$$

From (8) and (15) observe that

$$\mathbf{v}[k\tau(n - \psi)] = \Big(\prod_{i=1}^{k} \mathbf{Q}(i)\Big)\, \mathbf{v}[0]$$

Lemma 7 For $i \ge 1$, $\mathbf{Q}(i)$ is a scrambling row stochastic matrix, and

$$\lambda(\mathbf{Q}(i)) \le 1 - \beta^{\tau(n-\psi)}. \qquad (16)$$


Proof: $\mathbf{Q}(i)$ is a product of row stochastic matrices ($\mathbf{M}[t]$); therefore, $\mathbf{Q}(i)$ is row stochastic. From Lemma 3, for each $t \ge 1$,

$$\beta\, \mathbf{H}[t] \le \mathbf{M}[t]$$

Therefore,

$$\beta^{\tau(n-\psi)} \prod_{t=(i-1)\tau(n-\psi)+1}^{i\tau(n-\psi)} \mathbf{H}[t] \le \prod_{t=(i-1)\tau(n-\psi)+1}^{i\tau(n-\psi)} \mathbf{M}[t] = \mathbf{Q}(i)$$

By using $z = (i-1)\tau(n - \psi) + 1$ in Lemma 6, we conclude that the matrix product on the left side of the above inequality contains a non-zero column. Therefore, $\mathbf{Q}(i)$ on the right side of the inequality also contains a non-zero column.

Observe that $\tau(n - \psi)$ is finite, and hence, $\beta^{\tau(n-\psi)}$ is non-zero. Since the non-zero terms in the $\mathbf{H}[t]$ matrices are all 1, the non-zero elements in $\prod_{t=(i-1)\tau(n-\psi)+1}^{i\tau(n-\psi)} \mathbf{H}[t]$ must each be $\ge 1$. Therefore, there exists a non-zero column in $\mathbf{Q}(i)$ with all the elements in the column being $\ge \beta^{\tau(n-\psi)}$. Therefore, by Lemma 5, $\lambda(\mathbf{Q}(i)) \le 1 - \beta^{\tau(n-\psi)}$, and $\mathbf{Q}(i)$ is a scrambling matrix. □


Theorem 2 Suppose that $G(\mathcal{V},\mathcal{E})$ satisfies the sufficient condition stated above. Algorithm 1 satisfies both the validity and convergence conditions.


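Proof: Since $\mathbf{v}[t] = \mathbf{M}[t]\, \mathbf{v}[t-1]$, and $\mathbf{M}[t]$ is a row stochastic matrix, it follows that Algorithm 1 satisfies the validity condition.

Using Lemma 4 and the definition of $\mathbf{Q}(i)$, and using the inequalities $\lambda(\mathbf{M}[t]) \le 1$ and $\lambda(\mathbf{Q}(i)) \le 1 - \beta^{\tau(n-\psi)} < 1$, we get

$$\lim_{t \to \infty} \delta\Big(\prod_{i=1}^{t} \mathbf{M}[i]\Big) = \lim_{t \to \infty} \delta\Big(\Big(\prod_{i=\lfloor \frac{t}{\tau(n-\psi)} \rfloor \tau(n-\psi)+1}^{t} \mathbf{M}[i]\Big)\Big(\prod_{i=1}^{\lfloor \frac{t}{\tau(n-\psi)} \rfloor} \mathbf{Q}(i)\Big)\Big) \le \lim_{t \to \infty} \prod_{i=1}^{\lfloor \frac{t}{\tau(n-\psi)} \rfloor} \lambda(\mathbf{Q}(i)) = 0$$

Thus, the rows of $\prod_{i=1}^{t} \mathbf{M}[i]$ become identical in the limit. This observation, and the fact that $\mathbf{v}[t] = (\prod_{i=1}^{t} \mathbf{M}[i])\, \mathbf{v}[0]$, together imply that the states of the fault-free nodes satisfy the convergence condition. □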
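The chain of inequalities in the proof yields an explicit, if very loose, upper bound on $\delta$ after $t$ iterations: one factor of $1 - \beta^{\tau(n-\psi)}$ per completed block of $\tau(n - \psi)$ iterations. The sketch below evaluates that bound; the parameter values are arbitrary illustrations and the function name `delta_bound` is hypothetical.

```python
def delta_bound(t, beta, tau, n, psi):
    """Upper bound on delta of the product M[t] ... M[1].

    Each completed block of tau * (n - psi) iterations contributes a
    factor of at most 1 - beta ** (tau * (n - psi)), by Lemmas 4 and 7.
    """
    block = tau * (n - psi)
    return (1 - beta ** block) ** (t // block)


# Illustrative parameters: beta = 0.5, tau = 1, n = 3, psi = 1.
bound_after_1 = delta_bound(1, 0.5, 1, 3, 1)    # no completed block yet
bound_after_4 = delta_bound(4, 0.5, 1, 3, 1)    # two completed blocks
bound_after_40 = delta_bound(40, 0.5, 1, 3, 1)  # twenty completed blocks
```

The bound decays geometrically in the number of completed blocks, which is why the rows of the product become identical in the limit even though the per-block factor can be close to 1.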